55 research outputs found
Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors require user input to resolve, because
specific terminology and relationships are defined by domain expertise. For example,
in pharmaceuticals, 'Advil' \emph{is-a} brand name for 'ibuprofen', a relationship that can be
captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering \emph{Ontology Functional Dependencies} (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data shows the
scalability and accuracy of our algorithms. Comment: 12 pages
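To make the contrast between a syntactic FD and an ontology-aware dependency concrete, here is a minimal sketch. It is an illustration only, not the discovery algorithm from the paper. The toy `ONTOLOGY` mapping, the `code` and `drug` column names, and the rule "all right-hand-side values in a group must share at least one concept" are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy ontology: each surface value maps to the set of
# concepts it may denote. A real pharmaceutical ontology is far richer.
ONTOLOGY = {
    "advil": {"ibuprofen"},
    "ibuprofen": {"ibuprofen"},
    "tylenol": {"acetaminophen"},
    "acetaminophen": {"acetaminophen"},
}


def concepts(value):
    """Return the concepts a value may denote; unknown values denote themselves."""
    key = value.strip().lower()
    return ONTOLOGY.get(key, {key})


def violations(rows, lhs, rhs, use_ontology=True):
    """Return the left-hand-side keys whose groups violate lhs -> rhs.

    With use_ontology=False this is a plain syntactic FD check.
    With use_ontology=True a group is accepted when all of its rhs
    values share at least one concept (an assumed, simplified rule).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])

    bad = []
    for key, values in groups.items():
        if use_ontology:
            ok = bool(set.intersection(*(concepts(v) for v in values)))
        else:
            ok = len(set(values)) == 1
        if not ok:
            bad.append(key)
    return bad


# Made-up sample data for illustration.
ROWS = [
    {"code": "M01", "drug": "Advil"},
    {"code": "M01", "drug": "ibuprofen"},
    {"code": "M02", "drug": "Tylenol"},
    {"code": "M02", "drug": "ibuprofen"},
]

syntactic = violations(ROWS, ["code"], "drug", use_ontology=False)
ontological = violations(ROWS, ["code"], "drug", use_ontology=True)
```

Under these assumptions, the syntactic check flags both groups, while the ontology-aware check flags only `M02`, where 'Tylenol' and 'ibuprofen' share no concept.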
Profiling relational data: a survey
Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
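The simpler profiling results named above (null counts, distinct counts, data types) and one multi-column result (unique column combinations) can be sketched in a few lines. This is a naive illustration, not a method from the survey. The function names `profile_column` and `is_unique_combination` and the sample table are hypothetical.

```python
def profile_column(values):
    """Compute single-column statistics: null count, distinct count, observed types."""
    non_null = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "types": sorted({type(v).__name__ for v in non_null}),
    }


def is_unique_combination(rows, columns):
    """Check whether the given columns together identify every row uniquely."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Made-up sample table for illustration.
TABLE = [
    {"first": "Ada", "last": "Lovelace", "age": 36},
    {"first": "Ada", "last": "Byron", "age": None},
    {"first": "Alan", "last": "Turing", "age": 41},
]

age_profile = profile_column([row["age"] for row in TABLE])
first_is_unique = is_unique_combination(TABLE, ["first"])
name_is_unique = is_unique_combination(TABLE, ["first", "last"])
```

Checking every column combination this way is exponential in the number of columns, which is one reason multi-column metadata are harder to compute than single-column statistics.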
TimeFabric: Trusted Time for Permissioned Blockchains
As the popularity of blockchains continues to rise, blockchain platforms must be enhanced to support new application needs. In this paper, we propose one such enhancement that is essential for financial applications and online marketplaces: support for time-based logic such as verifying deadlines or expiry dates and examining a time window of recent account activity. We present a lightweight solution to reach consensus on the current time without relying on external time oracles. Our solution assigns timestamps to blocks at transaction validation time and maintains a cache reflecting the effects of recent transactions. We implement a proof-of-concept prototype, called TimeFabric, in Hyperledger Fabric, a popular permissioned blockchain platform, and experimentally demonstrate high throughput and minimal overhead (approximately 3%) of maintaining trusted time. We also demonstrate a 2x performance improvement due to the cache, compared to reconstructing account histories from the ledger.
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media
We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal
graph-based transformer model for detecting hate speech in online social
networks. In contrast to traditional text-only methods, our approach to
labelling a comment as hate speech centers around the holistic analysis of text
and images. This is done by leveraging graph transformers to capture the
contextual relationships in the entire discussion that surrounds a comment,
with interwoven fusion layers to combine text and image embeddings instead of
processing different modalities separately. We compare the performance of our
model to baselines that only process text; we also conduct extensive ablation
studies. We conclude with future work for multimodal solutions to deliver
social value in online contexts, arguing that capturing a holistic view of a
conversation greatly advances the effort to detect anti-social behavior. Comment: Under Submission
Statins Impair Antitumor Effects of Rituximab by Inducing Conformational Changes of CD20
Jakub Golab and colleagues found that statins significantly decrease rituximab-mediated complement-dependent cytotoxicity and antibody-dependent cellular cytotoxicity against B cell lymphoma cells.